Systemd free setup #507

Draft

cbosdo wants to merge 25 commits into main
Conversation

cbosdo (Contributor) commented Nov 22, 2024

What does this PR change?

Change the setup to run in a separate pod or Kubernetes job.
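As a rough sketch of the idea (not the actual uyuni-tools code), creating such a setup Job with client-go could look like the following; the resource names and the script parameter are placeholders:

```go
// Minimal sketch: run the setup script as a Kubernetes Job instead of
// exec-ing into the systemd-managed container. Names are illustrative.
package main

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func runSetupJob(ctx context.Context, client kubernetes.Interface, ns, image, script string) error {
	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "uyuni-setup", Namespace: ns},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "setup",
						Image: image,
						// The setup script no longer needs systemd as
						// PID 1, so it can run under a plain shell.
						Command: []string{"sh", "-c", script},
					}},
				},
			},
		},
	}
	_, err := client.BatchV1().Jobs(ns).Create(ctx, job, metav1.CreateOptions{})
	return err
}
```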

Codespace

Check if you already have a running container by clicking on Running CodeSpace.

  • Create CodeSpace
  • About billing for GitHub Codespaces
  • CodeSpace Billing Summary
  • CodeSpace Limit

Test coverage

  • No tests: requires end-to-end tests
  • DONE

Links

Issue(s): #

  • DONE

Changelogs

Make sure the changelog entries you are adding are compliant with https://github.com/uyuni-project/uyuni/wiki/Contributing#changelogs and https://github.com/uyuni-project/uyuni/wiki/Contributing#uyuni-projectuyuni-repository

If you don't need a changelog check, please mark this checkbox:

  • No changelog needed

If you uncheck the checkbox after the PR is created, you will need to re-run changelog_test (see below)

Before you merge

Check How to branch and merge properly!

nadvornik and others added 25 commits November 13, 2024 10:17
During a migration to Kubernetes the server is deployed after the rsync
to prepare the SSL secrets and PVC. This has the nasty effect of
corrupting the synchronized data with a too recent catalog version ID,
which would make the DB migration fail to start the old PostgreSQL
server.

To work around this, move the data to the backup location after the
rsync instead of at the beginning of the DB upgrade.
After the k8s migration the pod has been restarted since the initial
connection was created. We need to reset the connection so it stops
looking for the old pod name.
Some pods require a long time to run. This is the case for the DB
upgrade finalization, which runs a potentially long reindex.
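A minimal sketch of waiting on such a Job with a generous timeout, assuming client-go; the two-hour value is an illustration, not the project's setting:

```go
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForJob polls until the named Job succeeds, with a timeout large
// enough for a long reindex.
func waitForJob(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	return wait.PollUntilContextTimeout(ctx, 10*time.Second, 2*time.Hour, true,
		func(ctx context.Context) (bool, error) {
			job, err := client.BatchV1().Jobs(ns).Get(ctx, name, metav1.GetOptions{})
			if err != nil {
				return false, err
			}
			return job.Status.Succeeded > 0, nil
		})
}
```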
Migration deploys the helm chart multiple times. Without a description,
it is hard to identify which revision in helm history corresponds to
which step.
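For illustration, a sketch of passing a per-step description when deploying through the helm CLI (`helm upgrade` accepts a `--description` flag that is recorded in the release history); the release name and text are assumptions:

```go
package main

import "os/exec"

// deployChart runs helm with a custom description so that
// `helm history uyuni` shows which migration step produced each
// revision.
func deployChart(chartPath, step string) error {
	cmd := exec.Command("helm", "upgrade", "--install", "uyuni", chartPath,
		"--description", "uyuni migration: "+step)
	return cmd.Run()
}
```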
One of the issuer creation functions had two distinct behaviors, and
this only generated confusion when reading the whole code. This function
has been split, and some useless intermediary functions have been
merged.

This, together with better function naming, should make the SSL setup
code more understandable.
In the Kubernetes world we need to link the ports to services. For now
we only have a TCP and a UDP service for the server and the same for the
proxy, but in the short term we will need more services to allow
splitting into multiple pods.

This refactoring prepares that split.
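A small sketch of what grouping ports per service could look like; the `PortMap` type and field names here are illustrative, not the project's actual API:

```go
// Sketch of grouping exposed ports into per-protocol Kubernetes
// services, preparing the split into multiple pods.
package ports

type PortMap struct {
	Name     string
	Port     int
	Protocol string // "TCP" or "UDP"
	Service  string // service the port belongs to, e.g. "uyuni-tcp"
}

// GroupByService collects the ports each service must expose.
func GroupByService(ports []PortMap) map[string][]PortMap {
	grouped := map[string][]PortMap{}
	for _, p := range ports {
		grouped[p.Service] = append(grouped[p.Service], p)
	}
	return grouped
}
```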
Running commands in a running container only works if there is a running
container, and it is harder to unit test.

In order to help share code with Kubernetes, the SanityCheck now gets
the existing deployment version by inspecting its image. This also helps
adding unit tests for those checks.
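A minimal sketch of such a check with client-go; the namespace and deployment name are assumptions:

```go
package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deployedVersion reads the version from the image tag of the running
// server Deployment instead of exec-ing into a container.
func deployedVersion(ctx context.Context, client kubernetes.Interface) (string, error) {
	dep, err := client.AppsV1().Deployments("uyuni").Get(ctx, "uyuni", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	image := dep.Spec.Template.Spec.Containers[0].Image
	idx := strings.LastIndex(image, ":")
	if idx == -1 {
		return "", fmt.Errorf("no tag in image %q", image)
	}
	// e.g. registry.example.com/uyuni/server:2024.12 -> "2024.12"
	return image[idx+1:], nil
}
```

Taking a `kubernetes.Interface` parameter is what makes such a check easy to unit test with the client-go fake clientset.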
In order to later share code between those 3 very similar commands, we
need to share the parameters data structure.
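For illustration only, the shared parameters could be a plain struct reused by all three commands; every name and field here is hypothetical:

```go
// Sketch of a flags structure shared by the install, migrate and
// upgrade commands so their code can be shared later.
package utils

type ServerFlags struct {
	Image  string // server container image
	Mirror string
	SSL    SSLFlags
}

type SSLFlags struct {
	CA   string
	Cert string
	Key  string
}
```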
Migration to Kubernetes is rather fragile, with:
    1. tasks running in `kubectl exec` or as a `pod`.
    2. the uyuni helm chart being deployed multiple times.
    3. `hostPath` mounts used everywhere for the scripts to run and the
       data to read, forcing the tool to run on the cluster node.

Here are the solutions to those problems:

1. Each step will run as a Job, and those won't be deleted
   automatically, so the user can access their logs afterwards.

2. Stop using the helm chart and deploy the resources when we need
   them. This allows more control over what runs when, and reduces the
   number of useless starts of the giant container.

   The PostgreSQL DB upgrade will temporarily disable SSL in
   postgresql.conf in order to not depend on the SSL certificates being
   migrated yet (see the sketch after this list).

3. The scripts to run for each step will be passed directly as an
   `sh -c` parameter to the generated Jobs. The migration data are
   stored in a dedicated volume, not on the host.
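As an illustration of point 2, a sketch of the kind of script such a generated Job could run; all paths, versions and binaries are assumptions, not the actual migration script:

```go
package main

// dbUpgradeScript sketches a script passed with `sh -c` to a generated
// Job. SSL is switched off in postgresql.conf for the duration of the
// upgrade so the certificates don't have to be migrated first.
const dbUpgradeScript = `
set -e
# Temporarily disable SSL: the old server must start without the
# migrated certificates.
sed -i 's/^ssl = on/ssl = off/' /var/lib/pgsql/data/postgresql.conf
pg_upgrade --link \
    --old-bindir /usr/lib/postgresql14/bin \
    --new-bindir /usr/lib/postgresql16/bin \
    --old-datadir /var/lib/pgsql/data-old \
    --new-datadir /var/lib/pgsql/data
# Re-enable SSL once the upgrade is done.
sed -i 's/^ssl = off/ssl = on/' /var/lib/pgsql/data/postgresql.conf
`
```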

As a side effect, the SSH agent can no longer be used, as that would
again require running on a cluster node. The tool now creates a
ConfigMap to store the SSH config and known_hosts, and a Secret for a
passwordless SSH key.
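A minimal client-go sketch of creating those two resources; resource and key names are illustrative:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createSSHResources stores the SSH client configuration in a
// ConfigMap and the passwordless key in a Secret, so the migration Job
// does not need an SSH agent on a cluster node.
func createSSHResources(ctx context.Context, client kubernetes.Interface, ns string, config, knownHosts, key []byte) error {
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "migration-ssh-config", Namespace: ns},
		Data: map[string]string{
			"config":      string(config),
			"known_hosts": string(knownHosts),
		},
	}
	if _, err := client.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{}); err != nil {
		return err
	}
	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: "migration-ssh-key", Namespace: ns},
		Data:       map[string][]byte{"id_ed25519": key},
	}
	_, err := client.CoreV1().Secrets(ns).Create(ctx, secret, metav1.CreateOptions{})
	return err
}
```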

The PersistentVolumes are not destroyed after the end of the first job
and are then reused by the next ones and the final deployment.

Using Kubernetes API modules also helps for code reuse with a future
operator.

Note that the old PostgreSQL database cannot be moved to a separate
PersistentVolume: as we run `db_upgrade --link`, the old database is
linked by the new one and cannot be disposed of.
In order to share the same code for installation, migration and upgrade,
the RunSetup() function needs to move to the mgradm shared utils module.
Remove all server resources without relying on the helm chart.
Refactor the upgrade and installation of the server to no longer need
the helm chart, as initiated for the migration, and merge all those
logics into a single Reconcile() function to avoid redundancy.

Merging the code into a single function will also help figure out how to
implement an operator in the future.
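A skeleton of what such a Reconcile() could look like; every helper here is hypothetical and stubbed out:

```go
package kubernetes

import "context"

// Hypothetical helpers standing in for the real resource handling.
func currentDeployedImage(ctx context.Context) (string, error)      { return "", nil }
func needsDBUpgrade(oldImage, newImage string) bool                 { return false }
func runDBUpgradeJobs(ctx context.Context) error                    { return nil }
func applyServerResources(ctx context.Context, image string) error { return nil }

// Reconcile converges the cluster to the wanted state whether it
// starts from nothing (install), an old version (upgrade) or a
// migration: one flow instead of three near copies.
func Reconcile(ctx context.Context, image string) error {
	current, err := currentDeployedImage(ctx) // "" when nothing runs yet
	if err != nil {
		return err
	}
	if current != "" && needsDBUpgrade(current, image) {
		if err := runDBUpgradeJobs(ctx); err != nil {
			return err
		}
	}
	// Create or update the resources directly, without the helm chart.
	return applyServerResources(ctx, image)
}
```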
There is no need to run a potentially lengthy reindexing on minor
upgrades, only on major ones.

Don't call su with the `-` parameter, as it shows the warning message
for terminals... and that looks ugly in logs.
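A tiny sketch of the version check, assuming a simple major.minor version string; the parsing is simplified for illustration:

```go
package main

import "strings"

// needsReindex returns true only when the major version changed, so
// minor upgrades skip the lengthy reindex.
func needsReindex(oldVersion, newVersion string) bool {
	major := func(v string) string {
		if i := strings.Index(v, "."); i != -1 {
			return v[:i]
		}
		return v
	}
	return major(oldVersion) != major(newVersion)
}
```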
Since helm is no longer used to install Uyuni, but only cert-manager,
rename the flags. Also drop those that are no longer used for the server
after the refactoring.
With CGO enabled there are include problems on that architecture, and
fixing them would probably require cross-compiling.
The Traefik helm chart changed the structure of the expose property
starting with version 27. Read the chart version from the traefik.yaml
file and write the config accordingly.
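A sketch of choosing the expose format based on the chart major version; the value shapes reflect the chart change described above, and the function name is illustrative:

```go
package main

// exposeValue returns the expose property in the form the installed
// Traefik chart expects: a boolean before chart version 27, a map
// afterwards.
func exposeValue(chartMajor int) any {
	if chartMajor >= 27 {
		return map[string]bool{"default": true}
	}
	return true
}
```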
Running the first user creation from outside the container relies on the
pod being seen as ready by Kubernetes... and sometimes that takes longer
than expected. Calling the API from the setup script inside the
container allows using localhost and does not rely on the ingress to
route the request.
During the installation, a message indicated that the timezone from the
host couldn't be set in the container. This was due to not removing the
line end from the command output.
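A minimal sketch of the fix, assuming the timezone is read with `timedatectl` (the actual command in the code may differ):

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// hostTimezone reads the host timezone. The command prints the value
// followed by a newline: without trimming it, the timezone string
// never matched and could not be set in the container.
func hostTimezone() (string, error) {
	out, err := exec.Command("timedatectl", "show", "--value", "-p", "Timezone").Output()
	if err != nil {
		return "", fmt.Errorf("cannot read host timezone: %w", err)
	}
	return strings.TrimSpace(string(out)), nil
}
```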
In the Kubernetes world, running the setup as an exec is really dirty,
as we can't have it in an operator or helm chart. This commit benefits
from the setup script no longer needing systemd as PID 1 to move the
setup into a separate container.
In some cases the SSL key changed between the setup container and the
real one, and the PostgreSQL key had to be copied to fix the DB setup.
cbosdo marked this pull request as draft November 22, 2024 15:58